Practical Load Balancing for Content Requests 
in Peer-to-Peer Networks 
Abstract: This paper studies the problem of
load-balancing the demand for content in a peer-to-peer
network across heterogeneous peer nodes that hold 
replicas of the content. Previous decentralized load 
balancing techniques in distributed systems base 
their decisions on periodic updates containing infor- 
mation about load or available capacity observed at 
the serving entities. We show that these techniques 
do not work well in the peer-to-peer context; either 
they do not address peer node heterogeneity, or they 
suffer from significant load oscillations. We propose a 
new decentralized algorithm, Max-Cap, based on the 
maximum inherent capacities of the replica nodes and 
show that unlike previous algorithms, it is not tied to 
the timeliness or frequency of updates. Yet, Max-Cap
can handle the heterogeneity of a peer-to-peer envi- 
ronment without suffering from load oscillations. 



I. Introduction 



Peer-to-peer networks are becoming a popular
architecture for content distribution [Ora01]. The
basic premise in such networks is that any one of a
set of "replica" nodes can provide the requested
content, increasing the availability of interesting
content without requiring the presence of any
particular serving node.

Many peer-to-peer networks push index entries
throughout the overlay peer network in response to
lookup queries for specific content [gnu], [RFH+01],
[RD01], [SMK+01], [ZKJ01]. These index entries point
to the locations of replica nodes where the particular
content can be served, and are typically cached for a
finite amount of time, after which they are considered
stale. Until now, however, there has been little focus
on how an individual peer node should choose among the
returned index entries to forward client requests.

One reason for considering this choice is load 
balancing. Some replica nodes may have more ca- 
pacity to answer queries for content than others, 
and the system can serve content in a more timely 
manner by directing queries to more capable replica 
nodes. 

In this paper we explore the problem of load- 
balancing the demand for content in a peer-to-peer 
network. This problem is challenging for several 
reasons. First, in the peer-to-peer case there is 
no centralized dispatcher that performs the load- 
balancing of requests; each peer node individually 
makes its own decision on how to allocate incoming 
requests to replicas. Second, nodes do not typically 
know the identities of all other peer nodes in the net- 
work, and therefore they cannot coordinate this de- 
cision with those other nodes. Finally, replica nodes 
in peer-to-peer networks are not necessarily homo- 
geneous. Some replica nodes may be very powerful 
with great connectivity, whereas others may have 
limited inherent capacity to handle content requests. 

Previous load-balancing techniques in the litera- 
ture base their decisions on periodic or continuous 
updates containing information on load or avail- 
able capacity. We refer to this information as load- 
balancing information (LBI). These techniques have 
not been designed with peer-to-peer networks in 
mind and thus

(a) do not take into account the heterogeneity of
    peer nodes (e.g., [GC00], [Mit97]), or
(b) use techniques such as migration or handoff of
    tasks that cannot be used in a peer-to-peer
    environment (e.g., [^L96]), or
(c) suffer from significant load oscillations, or
    "herd behavior" [Mit97], where peer nodes
    simultaneously forward an unpredictable number of
    requests to replicas with low reported load or
    high reported available capacity, causing them to
    become overloaded. This herd behavior defeats the
    attempt to provide load-balancing.

Most of these techniques also depend on the
timeliness of LBI updates. The wide-area nature of
peer-to-peer networks and the variation in transfer
delays among peer nodes make guaranteeing the
timeliness of updates difficult. Peer nodes will
experience varying degrees of staleness in the LBI
updates they receive depending on their distance from
the source of updates. Moreover, maintaining the
timeliness of LBI updates is also costly, since all 
updates must travel across the Internet to reach in- 
terested peer nodes. The smaller the inter-update 
period and the larger the overlay peer network, the 
greater the network traffic overhead incurred by LBI 
updates. Therefore, in a peer-to-peer environment, 
an effective load-balancing algorithm should not be 
critically dependent on the timeliness of updates. 

In this paper we propose a practical load- 
balancing algorithm, Max-Cap, that makes deci- 
sions based on the inherent maximum capacities of 
the replica nodes. We define maximum capacity as 
the maximum number of content requests per time 
unit that a replica claims it can handle. Alterna- 
tive measures such as maximum (allowed) connec- 
tions can be used. The maximum capacity is like 
a contract by which the replica agrees to abide. If 
the replica cannot sustain its advertised rate, then it 
may choose to advertise a new maximum capacity. 
Max-Cap is not critically tied to the timeliness or 
frequency of LBI updates, and as a result, when ap- 
plied in a peer-to-peer environment, outperforms al- 
gorithms based on load or available capacity, whose 
benefits are heavily dependent on the timeliness of 
the updates. 

We show that Max-Cap takes peer node heterogeneity
into account, unlike algorithms based on load. While
algorithms based on available capacity take
heterogeneity into account, we show that
they can suffer from load oscillations in a peer-to- 
peer network in the presence of small fluctuations in 
the workload even when the workload request rate 
is well below the total maximum capacities of the 
replicas. On the other hand, Max-Cap avoids over- 
loading replicas in such cases and is more resilient 
to very large fluctuations in workload. This is be- 
cause a key advantage of Max-Cap is that it uses 
information that is not affected by changes in the 
workload. 

Since it is most probable that each replica node 
will run other applications besides the peer-to-peer 
content distribution application, Max-Cap must also 
be able to handle fluctuations in "extraneous load" 
observed at the replicas. This is load caused by ex- 
ternal factors such as other applications the users of 
the replica node are running or network conditions 
occurring at the replica node. 

We modify Max-Cap to perform load-balancing 
using the "honored maximum capacity" of each 
replica. This is the maximum capacity minus the 
extraneous load observed at the replica. Although 
the honored maximum capacities may change fre- 
quently, the changes are independent of fluctuations 
in the content request workload. As a result, Max- 
Cap continues to provide better load-balancing than 
availability-based algorithms even when there are 
large fluctuations in the extraneous load. 
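
As a minimal sketch of the honored maximum capacity, assuming capacities and extraneous load are both expressed in requests per second: the replica names and numbers below are hypothetical, and clamping the result at zero is an assumption, since the text only specifies the subtraction.

```python
def honored_max_capacity(max_capacity: float, extraneous_load: float) -> float:
    """Maximum capacity minus extraneous load.

    Clamping at zero is an assumption for the case where extraneous
    load exceeds the advertised maximum capacity.
    """
    return max(0.0, max_capacity - extraneous_load)


# Hypothetical replicas: (advertised maximum capacity, extraneous load).
replicas = {"r0": (10.0, 2.0), "r1": (100.0, 30.0), "r2": (1.0, 1.5)}

honored = {rid: honored_max_capacity(c, e) for rid, (c, e) in replicas.items()}

# Share of requests each replica would receive if nodes allocate
# in proportion to honored maximum capacity.
total = sum(honored.values())
shares = {rid: h / total for rid, h in honored.items()}
```

Note that neither input depends on the content request workload, which is the property the paragraph above relies on.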

In a peer-to-peer environment the expectation is 
that the set of participating nodes changes con- 
stantly. Since replica arrivals to and departures from 
the peer network can affect the information carried 
in LBI updates, we also compare Max-Cap against 
availability-based algorithms when the set of repli- 
cas continuously changes. We show that Max-Cap 
is less affected by changes in the replica set than the 
availability-based algorithms. 

We evaluate load-based and availability-based
algorithms and compare them with Max-Cap in the
context of CUP [RB02], a protocol that asynchronously
builds and maintains caches of index entries in
peer-to-peer networks through Controlled Update
Propagation. The index entries for a particular
content contain IP addresses that point to replica
nodes serving the content. Load-balancing decisions
are made from amongst these cached indices to
determine to which of the replica nodes a request for
that content should be forwarded. CUP periodically
propagates updates of desired index entries
down a conceptual tree (similar to an application- 
level multicast tree) whose vertices are interested 
peer nodes. We leverage CUP's propagation mecha- 
nism by piggybacking LBI such as load or available 
capacity onto the updates CUP propagates. 

The rest of this paper is organized as follows.
Section II briefly describes the CUP protocol and how
we use it to propagate the load-balancing information
necessary to implement the various load-balancing
algorithms across replica nodes. Section III
introduces the algorithms compared. Section IV
presents experimental results showing that in a
peer-to-peer environment, Max-Cap outperforms the
other algorithms with much less or no overhead.
Section V describes related work, and Section VI
concludes the paper.



II. CUP Protocol Design 

In this section we briefly describe how we leverage 
the CUP protocol to study the load-balancing prob- 
lem in a peer-to-peer context. CUP is a protocol for 
maintaining caches of index entries in peer-to-peer 
networks through Controlled Update Propagation.

CUP supports both structured and unstructured 
networks. In structured networks lookup queries 
for particular content follow a well-defined path 
from the querying node toward an authority node, 
which is guaranteed to know the location of the con- 
tent within the network. In unstructured networks 
lookup queries are flooded haphazardly throughout 
the network until a node that knows the location 
of the content is met. In this paper, we will
describe how CUP works within structured networks.

In CUP every node in the peer-to-peer network
maintains two logical channels per neighbor: a 
query channel and an update channel. The query 
channel is used to forward lookup queries for con- 
tent of interest to the neighbor that is closest to the 
authority node for that content. The update channel 
is used to forward query responses asynchronously 
to a neighbor. These query responses contain sets 
of index entries that point to nodes holding the
content in question. The update channel is also used
to update the index entries that are cached at the
neighbor.

Figure 1 shows a snapshot of CUP in progress in
a network of seven nodes. The four logical chan- 
nels are shown between each pair of nodes. The 
left half of each node shows the set of content items 
for which the node is the authority. The right half 
shows the set of content items for which the node 
has cached index entries as a result of handling 
lookup queries. For example, node A is the author- 
ity node for content K3 and nodes C,D,E,F, and G 
have cached index entries for content K3. The pro- 
cess of querying and updating index entries for a 
particular content K forms a CUP tree whose root 
is the authority node for content K. The branches of 
the tree are formed by the paths traveled by lookup 
queries from other nodes in the network. For
example, in Figure 1, node A is the root of the CUP tree
for K3 and branch {F,D,C,A} has grown as a result 
of a lookup query for K3 at node F. 

It is the authority node A for content K3 which is 
guaranteed to know the location of all nodes, called 
content replica nodes or simply replicas, that serve 
content K3. Replica nodes first send birth messages 
to authority A to indicate they are serving content 
K3. They may also send periodic refreshes or in- 
validation messages to A to indicate they are still 
serving or no longer serving the content. A then for- 
wards on any birth, refresh or invalidation messages 
it receives, which are propagated down the CUP tree 
to all interested nodes in the network. For example, 
in Figure 1, any update messages for index entries
associated with content K3 that arrive at A from 
replica nodes are forwarded down the K3 CUP tree 
to C at level 1, D and E at level 2, and F and G at 
level 3. 
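
The level-by-level push in this example can be sketched as a breadth-first walk from the authority node. This is an illustration only, not the CUP implementation: the tree shape is partly assumed (F sits under D because of the branch {F,D,C,A}, but placing G under E is a guess), and the function name is hypothetical.

```python
from collections import deque

# Children of each node in the K3 CUP tree.
# F under D follows the branch {F,D,C,A}; G under E is an assumption.
cup_tree_k3 = {
    "A": ["C"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["G"],
    "F": [],
    "G": [],
}


def propagate(tree, root):
    """Push an update from the authority node down the tree, breadth first.

    Returns a dict mapping each tree level to the nodes reached at that level.
    """
    levels = {}
    queue = deque((child, 1) for child in tree[root])
    while queue:
        node, level = queue.popleft()
        levels.setdefault(level, []).append(node)
        queue.extend((child, level + 1) for child in tree[node])
    return levels


levels = propagate(cup_tree_k3, "A")
```

With the assumed tree, `levels` groups C at level 1, D and E at level 2, and F and G at level 3, matching the description above.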



CUP has been extensively studied in [RB02].
While the specific update propagation protocol CUP 
uses has been shown to provide benefits such as 
greatly reducing the latency of lookup queries, the 
specific CUP protocol semantics are not required for 
the purposes of load-balancing. We simply lever- 
age the update propagation mechanism of CUP to 
push LBI such as replica load or capacity to inter- 
ested peer nodes throughout the overlay network. 
These peer nodes can then use this information 
when choosing to which replica a client request 
should be forwarded. 

Fig. 1. CUP Trees



III. The Algorithms 

We evaluate two different algorithms, Inv-Load and 
Avail-Cap. Each is representative of a different class 
of algorithms that have been proposed in the dis- 
tributed systems literature. We study how these al- 
gorithms perform when applied in a peer-to-peer 
context and compare them with our proposed al- 
gorithm, Max-Cap. These three algorithms depend 
on different LBI being propagated, but their overall 
goal is the same: to balance the demand for content 
fairly across the set of replicas providing the con- 
tent. In particular, the algorithm should avoid over- 
loading some replicas while underloading others, 
especially when the aggregate capacity of all repli- 
cas is enough to handle the content request work- 
load. Moreover, the algorithm should prevent indi- 
vidual replicas from oscillating between being over- 
loaded and underloaded. 

Oscillation is undesirable for two reasons. First, 
many applications limit the number of requests a 
host can have outstanding. This means that when a 
replica node is overloaded, it will drop any request it 
receives. This forces the requesting client to resend 
its request which has a negative impact on response 
time. Even for applications that allow requests to 
be queued while a replica node is overloaded the 
queueing delay incurred will also increase the
average response time. Second, in a peer-to-peer
network, the issue of fairness is sensitive. The owners
of replica nodes are likely not to want their nodes to 
be overloaded while other nodes in the network are 
underloaded. An algorithm that can fairly distribute 
the request workload without causing replicas to os- 
cillate between being overloaded and underloaded is 
preferable. 

We describe each of the algorithms we evaluate 
in turn: 

Allocation Proportional to Inverse Load
(Inv-Load). There are many load-balancing algorithms
that base the allocation decision on the load
observed at and reported by each of the serving
entities (see the Related Work section). The
representative load-based algorithm we examine in this
paper is Inv-Load, based on the algorithm presented by
Genova et al. [GC00]. In this algorithm, each peer
node in the network chooses to forward a request to
a replica with probability inversely proportional to
the load reported by the replica. This means that the
replica with the smallest reported load (as of the last
report received) will receive the most requests from
the node. Load is defined as the number of request
arrivals at the replica per time unit. Other possible
load metrics include the number of request
connections open at the replica at reporting time
[AB00] or the request queue length at the replica
[Dah99].

The Inv-Load algorithm has been shown to perform
as well as or better than other proposed algorithms
in a homogeneous environment, and for this reason we
focus on this algorithm in this study. However, as we
will see in Section IV-A, Inv-Load is not designed to
handle replica node heterogeneity.
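
A minimal sketch of the Inv-Load allocation rule, assuming load is reported in requests per second. The function name and the sample reports are hypothetical, and the small floor on the load is an assumption: the text does not say how a reported load of zero is treated.

```python
def inv_load_probabilities(reported_load):
    """Forwarding probability per replica, proportional to 1 / reported load.

    The floor `eps` guards against division by zero and is an assumption.
    """
    eps = 1e-9
    weights = {rid: 1.0 / max(load, eps) for rid, load in reported_load.items()}
    total = sum(weights.values())
    return {rid: w / total for rid, w in weights.items()}


# Hypothetical reports. "low" and "high" report the same load, so they get
# the same share even if their maximum capacities differ by a factor of 100.
probs = inv_load_probabilities({"low": 5.0, "high": 5.0, "busy": 10.0})
```

Because maximum capacity never enters the weights, two replicas with equal reported load are treated identically, which is the heterogeneity problem noted above.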

Allocation Proportional to Available Capacity 
(Avail-Cap). In this algorithm, each peer node 
chooses to forward a request to a replica with proba- 
bility proportional to the available capacity reported 
by the replica. Available capacity is the maximum 
request rate a replica can handle minus the load (ac- 
tual request rate) experienced at the replica. This 
algorithm is based on the algorithm proposed by 
Zhu et al. [ZYZ+98] for load sharing in a cluster
of heterogeneous servers. Avail-Cap takes into
account heterogeneity because it distinguishes be- 
tween nodes that experience the same load but have 
different maximum capacities. 

Intuitively, Avail-Cap seems like it should work;
it handles heterogeneity by sending more requests to 
the replicas that are currently more capable. Repli- 
cas that are overloaded report an available capacity 
of zero and are excluded from the allocation deci- 
sion until they once more report a positive available 
capacity. Unfortunately, as we will show in Section
IV-B, this exclusion can cause Avail-Cap to suffer
from wild load oscillations.
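
A minimal sketch of the Avail-Cap allocation rule, including the exclusion of replicas that report zero available capacity. Names and numbers are hypothetical, and the uniform fallback when every replica reports zero is an assumption that the text does not address.

```python
def avail_cap_probabilities(max_capacity, reported_load):
    """Forwarding probability per replica, proportional to available capacity.

    Available capacity is maximum capacity minus reported load, floored at
    zero, so overloaded replicas are excluded from the allocation.
    """
    avail = {
        rid: max(0.0, max_capacity[rid] - reported_load[rid])
        for rid in max_capacity
    }
    total = sum(avail.values())
    if total == 0:
        # Assumption: spread requests evenly if no replica reports capacity.
        return {rid: 1.0 / len(avail) for rid in avail}
    return {rid: a / total for rid, a in avail.items()}


# Hypothetical snapshot, in requests per second. r2 is overloaded.
max_capacity = {"r0": 10.0, "r1": 100.0, "r2": 100.0}
reported_load = {"r0": 4.0, "r1": 40.0, "r2": 120.0}
probs = avail_cap_probabilities(max_capacity, reported_load)
```

Here r2 receives no requests until its next report, while r1 absorbs most of the traffic from every node that holds the same stale snapshot.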

Both Inv-Load and Avail-Cap implicitly assume 
that the load or available capacity reported by a 
replica remains roughly constant until the next re- 
port. Since both these metrics are directly affected 
by changes in the request workload, both algorithms 
require that replicas periodically update their LBI. 
(We assume replicas are not synchronized in when 
they send reports.) Decreasing the period between 
two consecutive LBI updates increases the timeli- 
ness of the LBI at a cost of higher overhead (in num- 
ber of updates pushed through the peer-to-peer net- 
work). In large peer-to-peer networks, there may be 
several levels in the CUP tree down which updates 
will have to travel, and the time to do so could be on 
the order of seconds. 

Allocation Proportional to Maximum Capacity 
(Max-Cap). This is the algorithm we propose. In 
this algorithm, each peer node chooses to forward 
a request to a replica with probability proportional 
to the maximum capacity of the replica. The max- 
imum capacity is a contract each replica advertises 
indicating the number of requests the replica claims 
to handle per time unit. Unlike load and available 
capacity, the maximum capacity of a replica is not 
affected by changes in the content request workload. 
Therefore, Max-Cap does not depend on the timeli- 
ness of the LBI updates. In fact, replicas only push 
updates down the CUP tree when they choose to ad- 
vertise a new maximum capacity. This choice de- 
pends on extraneous factors that are unrelated to and 
independent of the workload (see Section IV-D). If
replicas rarely choose to change contracts, Max-Cap
incurs near-zero overhead. We believe that this
independence from the timeliness and frequency of
updates makes Max-Cap practical and elegant for use
in peer-to-peer networks. 
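
A minimal sketch of the Max-Cap decision at a single peer node, assuming the node already holds each replica's advertised maximum capacity. The replica names, the capacities, and the fixed seed are illustrative choices, not part of the algorithm.

```python
import random
from collections import Counter


def max_cap_choose(max_capacity, rng):
    """Pick one replica with probability proportional to its maximum capacity."""
    rids = list(max_capacity)
    weights = [max_capacity[rid] for rid in rids]
    return rng.choices(rids, weights=weights, k=1)[0]


# Hypothetical advertised capacities, in requests per second.
max_capacity = {"low": 1.0, "mid": 10.0, "high": 100.0}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
num_requests = 100_000
counts = Counter(max_cap_choose(max_capacity, rng) for _ in range(num_requests))
shares = {rid: counts[rid] / num_requests for rid in max_capacity}
```

The observed shares should approach 1/111, 10/111, and 100/111. The weights change only when a replica advertises a new contract, so no periodic updates are needed to keep them current.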



IV. Experiments 

In this section we describe experiments that
measure the ability of the Inv-Load, Avail-Cap and
Max-Cap algorithms to balance requests for content
fairly across the replicas holding the content. We
simulate a content-addressable network (CAN) [RFH+01]
using the Stanford Narses simulator [MGB01]. A CAN is
an example of a structured peer-to-peer network,
defined in Section II. In each of these experiments,
requests for a specific piece of content are posted at
nodes throughout the CAN network for 3000 seconds.
Using the CUP protocol described in Section II, a node
that receives
a content request from a local client retrieves a set 
of index entries pointing to replica nodes that serve 
the content. The node applies a load-balancing al- 
gorithm to choose one of the replica nodes. It then 
points the local client making the content request at 
the chosen replica. 

The simulation input parameters include: the 
number of nodes in the overlay peer-to-peer net- 
work, the number of replica nodes holding the con- 
tent of interest, the maximum capacities of the 
replica nodes, the distribution of content request 
inter-arrival times, a seed to feed the random num- 
ber generators that drive the content request arrivals 
and the allocation decisions of the individual nodes, 
and the LBI update period, which is the amount of 
time each replica waits before sending the next LBI 
update for the Inv-Load and Avail-Cap algorithms. 

We assign maximum capacities to replica nodes 
by applying results from recent work that measures 
the upload capabilities of nodes in Gnutella
networks [SGG02]. This work has found that for the
Gnutella network measured, around 10% of nodes 
are connected through dial-up modems, 60% are 
connected through broadband connections such as 
cable modem or DSL where the upload speed is 
about ten times that of dial-up modems, and the 
remaining 30% have high-end connections with 
upload speed at least 100 times that of dial-up 
modems. Therefore we assign maximum capacities 
of 1, 10, and 100 requests per second to nodes with 
probabilities of 0.1, 0.6, and 0.3, respectively.
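
The capacity assignment above can be sketched as a weighted draw per replica. This is not the simulator's code: the function name and seed are hypothetical, and the request rate line simply applies the 80% workload level used in the experiments that follow.

```python
import random

CAPACITY_CLASSES = [1, 10, 100]        # requests per second
CLASS_PROBABILITIES = [0.1, 0.6, 0.3]  # dial-up, broadband, high-end


def assign_max_capacities(num_replicas, rng):
    """Draw a maximum capacity for each replica from the three classes."""
    return rng.choices(CAPACITY_CLASSES, weights=CLASS_PROBABILITIES, k=num_replicas)


rng = random.Random(42)  # arbitrary seed, for repeatability only
capacities = assign_max_capacities(10, rng)

total_capacity = sum(capacities)
# Workload set to 80% of the total maximum capacity of the replicas.
request_rate = 0.8 * total_capacity
```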

In all the experiments we present in this paper, 
the number of nodes in the network is 1024, each 
individually deciding how to distribute its incoming 
content requests across the replica nodes. We use
both Poisson and Pareto request inter-arrival
distributions, both of which have been found to hold
in peer-to-peer networks [Cao02], [Mar02].



We present five experiments. First we show that 
Inv-Load cannot handle heterogeneity. We then 
show that while Avail-Cap takes replica heterogene- 
ity into account, it can suffer from significant load 
oscillations caused by even small fluctuations in the 
workload. We compare Max-Cap with Avail-Cap 
for both Poisson and bursty Pareto arrivals. We also 
compare the effect on the performances of Avail- 
Cap and Max-Cap when replicas continuously enter 
and leave the system. Finally, we study the effect on 
Max-Cap when replicas cannot always honor their 
advertised maximum capacities because of signifi- 
cant extraneous load. 

A. Inv-Load and Heterogeneity 

In this experiment, we examine the performance of 
Inv-Load in a heterogeneous peer-to-peer environ- 
ment. We use a fairly short inter-update period of 
one second, which is quite aggressive in a large 
peer-to-peer network. We have ten replica nodes 
that serve the content item of interest, and we gen- 
erate request rates for that item according to a Pois- 
son process with an arrival rate that is 80% of the 
total maximum capacities of the replicas. Under 
such a workload, a good load-balancing algorithm 
should be able to avoid overloading some replicas 
while underloading others. Figure 2 shows a
scatterplot of how the utilization of each replica
proceeds with time when using Inv-Load. We define
utilization as the request arrival rate observed by 
the replica divided by the maximum capacity of the 
replica. In this graph, we do not distinguish among 
points of different replicas. We see that throughout 
the simulation at any point in time, some replicas 
are severely overutilized (over 250%) while others 
are lightly utilized (around 25%).

Figure 3 shows, for each replica, the percentage
of all received requests that arrive while the replica 
is overloaded. This measurement gives a true pic- 
ture of how well a load-balancing algorithm works 
for each replica. In Figure ??b, the replicas that 
receive almost 100% of their requests while
overloaded (i.e., replicas 0-6) are the low and
middle-end replicas. The replicas that receive almost
no requests while overloaded (i.e., replicas 7-9) are
the high-end replicas. We see that Inv-Load penalizes
the less capable replicas while giving the high-end
replicas an easy time.
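
The two per-replica measurements used in this section can be sketched as follows. Treating each one-second bin as either overloaded or not, with overload meaning more arrivals than the maximum capacity in that second, is an assumption about how the measurement is taken. The arrival counts are made up.

```python
def utilization(arrivals_per_second, max_capacity):
    """Request arrival rate divided by maximum capacity, per one-second bin."""
    return [a / max_capacity for a in arrivals_per_second]


def overloaded_request_percentage(arrivals_per_second, max_capacity):
    """Percentage of all received requests that arrive in overloaded bins.

    A bin counts as overloaded when its arrivals exceed the maximum
    capacity (an assumption; see the lead-in).
    """
    total = sum(arrivals_per_second)
    if total == 0:
        return 0.0
    overloaded = sum(a for a in arrivals_per_second if a > max_capacity)
    return 100.0 * overloaded / total


# Hypothetical arrivals at a replica with a capacity of 10 requests/second.
arrivals = [8, 12, 7, 15, 8]
util = utilization(arrivals, 10)
pct = overloaded_request_percentage(arrivals, 10)
```

In this made-up trace, 27 of the 50 requests arrive during the two overloaded seconds, so `pct` is 54%.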

Inv-Load is designed to perform well in a homo- 
geneous environment. When applied in a heteroge- 
neous environment such as a peer-to-peer network, 
it fails. As we will see in the next section,
Max-Cap is much better suited. Apart from showing that
Max-Cap has comparable load balancing capability 
with no overhead in a homogeneous environment 
(see Appendix), we do not consider Inv-Load in the 
remaining experiments as our focus here is on het- 
erogeneous environments. 

B. Avail-Cap versus Max-Cap

In this set of experiments we examine the perfor- 
mance of Avail-Cap and compare it with Max-Cap. 

1) Poisson Request Arrivals: In Figures 4 and 5
we show the replica utilization versus time for an
experiment with ten replicas with a Poisson request
arrival rate of 80% of the total maximum capacities of
the replicas. For Avail-Cap, we use an inter-update
period of one second. For Max-Cap, this parameter is
inapplicable since replica nodes do not send updates
unless they experience extraneous load (see Section
IV-D). We see that Avail-Cap consistently overloads
some replicas while underloading others. In contrast,
Max-Cap tends to cluster replica utilization at around
80%. We ran this experiment with a range of Poisson
lambda rates and found similar results for rates that
were 60-100% of the total maximum capacities of the
replicas. Avail-Cap consistently overloads some
replicas while underloading others, whereas Max-Cap
clusters replica utilization at around X%, where X is
the overall request rate divided by the total maximum
capacities of the replicas.

It turns out that in Avail-Cap, unlike Inv-Load, 
it is not the same replicas that are consistently 
overloaded or underloaded throughout the experi- 
ment. Instead, from one instant to the next, indi- 
vidual replicas oscillate between being overloaded 
and severely underloaded. 

We can see a sampling of this oscillation by look- 
ing at the utilizations of some individual replicas 
over time. In Figures 6-11, we plot the utilization
over a one minute period in the experiment for a rep- 
resentative replica from each of the replica classes 
(low, medium, and high maximum capacity). We 
also plot the ratio of the overall request rate to the 
total maximum capacities of the replicas and the 
line y = 1 showing 100% utilization. We see that 
for all replica classes, Avail-Cap suffers from sig- 
nificant oscillation when compared with Max-Cap 
which causes little or no oscillation. This behavior 
occurs throughout the experiment. 

Figures 12 and 13 show the percentage of requests
that arrive at each replica while the replica is
overloaded for Avail-Cap and Max-Cap respectively. We
see that Max-Cap achieves much lower percentages than
Avail-Cap.

We also see in Figure 13 that Max-Cap exhibits a
step-like behavior where the low-capacity replica
(replica 1) is overloaded for about 35% of its
queries, the middle-capacity replicas (replicas 0 and
2-6) are each overloaded for about 14% of their
queries, and the high-capacity replicas (replicas 7-9)
are each overloaded for about 0.1% of their queries.
To verify that this step effect is not a random
coincidence, we ran a series of experiments, with ten
replicas per experiment, and Poisson arrivals of 80%
of the total maximum capacity, each time varying the
seed fed to the simulator. In Figure 14, we show the
overloaded percentages for ten of these experiments.
On the x-axis we order replicas according to maximum
capacity, with the low-capacity replicas plotted first
(replica IDs 1 through 10), followed by the
middle-capacity replicas (replica IDs 11-70), followed
by the high-capacity replicas (replica IDs 71-100).
From the figure we see that the step behavior
consistently occurs. This step behavior occurs because
the lower-capacity replicas have less tolerance for
noise in the random coin tosses the nodes perform
while assigning requests. They also have less
tolerance for small fluctuations in the request rate.
As a result, lower-capacity replicas are overloaded
more easily than higher-capacity replicas.
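
One rough way to quantify this tolerance, offered as a back-of-the-envelope sketch rather than a result from the experiments: assume the requests reaching a replica of maximum capacity $C$ form a Poisson stream with mean $0.8\,C$ per second (independent random splitting of a Poisson stream yields Poisson substreams), and let $N$ be the number of arrivals in a one-second interval. The one-second interval is an assumption chosen to match the capacity unit. The relative size of the fluctuations is then

```latex
\[
\frac{\sigma_N}{\mathbb{E}[N]}
  = \frac{\sqrt{0.8\,C}}{0.8\,C}
  = \frac{1}{\sqrt{0.8\,C}}
  \approx
  \begin{cases}
    1.12 & C = 1,\\
    0.35 & C = 10,\\
    0.11 & C = 100.
  \end{cases}
\]
```

Under these assumptions, a replica with capacity 1 sees fluctuations larger than its mean load, whereas a replica with capacity 100 sees fluctuations of about 11%, which is well inside its 20% headroom.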



Figure 12 shows that Avail-Cap with an inter-update
period of one second causes much higher percentages
than Max-Cap (more than twice as high for the medium
and high-end replicas). Avail-Cap also causes fairly
even overloaded percentages at around 40%. Again, to
verify this evenness, in Figure 15, we show for a
series of ten experiments the percentage of requests
that arrive at each replica while the replica is
overloaded. We see that Avail-Cap consistently
achieves roughly even percentages (at around 40%)
across all replica types, in contrast to the step
effect observed with Max-Cap. This can be explained by
looking at the oscillations observed by replicas in
Figures 6-11. In Avail-Cap, each replica is overloaded
for roughly the same amount of time regardless of
whether it is a low, medium or high-capacity replica.
This means that while each replica is getting the
correct proportion of requests, it is receiving them
at the wrong time and as a result all the replicas
experience roughly the same overloaded percentages. In
Max-Cap, we see that replicas with lower maximum
capacity are overloaded for more time than
higher-capacity replicas. Consequently,
higher-capacity replicas tend to have smaller overload
percentages than lower-capacity replicas.

Fig. 16. Percentage Overload Queries versus Replica
ID for Avail-Cap with inter-update period of
10 seconds, for ten experiments.

The performance of Avail-Cap is highly depen- 
dent on the inter-update period used. We find that 
as we increase the period and available capacity up- 
dates grow more stale, the performance of Avail- 
Cap suffers more. As an example, in Figure 16, 
we show the overloaded query percentages in the 
same series of ten experiments for Avail-Cap with a 
period of ten seconds. The overloaded percentages 
jump up to about 80% across the replicas. 

In a peer-to-peer environment, we argue that 
Max-Cap is a more practical choice than Avail-Cap. 
First, Max-Cap typically incurs no overhead.
Second, Max-Cap can handle workload rates that are
below 100% of the total maximum capacities and can
handle small fluctuations in the workload as are
typical in Poisson arrivals.

A remaining question is how Avail-Cap and Max-Cap
compare when workload rates fluctuate beyond the total
maximum capacities of the replicas. Such a scenario
can occur, for example, when requests are bursty, as
when inter-request arrival times follow a Pareto
distribution. We examine Pareto arrivals next.

2) Pareto Request Arrivals: Recent work has
observed that in some peer-to-peer networks, request
inter-arrivals exhibit burstiness on several time
scales [Mar02], making the Pareto distribution a good
candidate for modeling these inter-arrival times.

The Pareto distribution has two parameters
associated with it: the shape parameter $\alpha > 0$
and the scale parameter $\kappa > 0$. The cumulative
distribution function of inter-arrival time durations
is $F(x) = 1 - \left(\frac{\kappa}{x + \kappa}\right)^{\alpha}$.
This distribution is heavy-tailed with unbounded
variance when $\alpha < 2$. For $\alpha > 1$, the
average number of query arrivals per time unit is
equal to $\frac{\alpha - 1}{\kappa}$. For
$\alpha \le 1$, the expectation of an inter-arrival
duration is unbounded and therefore the average number
of query arrivals per time unit is 0.
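
As a sketch of how such inter-arrival times could be generated, assuming the cumulative distribution function is F(x) = 1 - (kappa / (x + kappa))^alpha with shape alpha and scale kappa: inverting F gives x = kappa * ((1 - u)^(-1/alpha) - 1) for u uniform on [0, 1). The function names are hypothetical, and the parameter values are those of the representative experiment (1.1 and 0.000346).

```python
import random


def pareto_cdf(x, alpha, kappa):
    """F(x) = 1 - (kappa / (x + kappa)) ** alpha, for x >= 0."""
    return 1.0 - (kappa / (x + kappa)) ** alpha


def sample_inter_arrival(alpha, kappa, rng):
    """Draw one inter-arrival time by inverse-transform sampling."""
    u = rng.random()  # uniform on [0, 1)
    return kappa * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def mean_arrival_rate(alpha, kappa):
    """Average arrivals per time unit: (alpha - 1) / kappa when alpha > 1, else 0."""
    return (alpha - 1.0) / kappa if alpha > 1.0 else 0.0


rng = random.Random(1)  # arbitrary seed, for repeatability only
x = sample_inter_arrival(1.1, 0.000346, rng)
rate = mean_arrival_rate(1.1, 0.000346)
```

With these parameter values the mean rate works out to roughly 289 requests per time unit.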

Typically, Pareto request arrivals are characterized by
frequent and intense bursts of requests followed by idle
periods of varying lengths. During the bursts, the average
request arrival rate can be many times the total maximum
capacities of the replicas. We present a representative
experiment in which $\alpha$ and $\kappa$ are 1.1 and
0.000346 respectively. These particular settings cause
bursts of up to 230% of the total maximum capacities of the
replicas. With such intense bursts, no load-balancing
algorithm can be expected to keep replicas underloaded.
Instead, the best an algorithm can do is to have the
oscillation observed by each replica's utilization match
the oscillation of the ratio of overall request rate to
total maximum capacities.
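The Pareto inter-arrival model described above can be sketched in a few lines. This is only an illustration, assuming the cumulative distribution function $F(x) = 1 - (\kappa/(x+\kappa))^{\alpha}$ and sampling by inverting it; the function names are hypothetical and are not taken from the simulator used in the experiments.

```python
import random


def pareto_cdf(alpha, kappa, x):
    """F(x) = 1 - (kappa / (x + kappa))**alpha for x >= 0."""
    return 1.0 - (kappa / (x + kappa)) ** alpha


def pareto_interarrival(alpha, kappa, u):
    """Inverse of F at quantile u in [0, 1): one inter-arrival duration."""
    return kappa * ((1.0 - u) ** (-1.0 / alpha) - 1.0)


def mean_arrival_rate(alpha, kappa):
    """Average arrivals per time unit: (alpha - 1) / kappa, or 0 if alpha <= 1."""
    if alpha <= 1.0:
        return 0.0
    return (alpha - 1.0) / kappa


def sample_arrival_times(alpha, kappa, n, seed=0):
    """Cumulative arrival times of n requests (seed is an illustrative choice)."""
    rng = random.Random(seed)
    now = 0.0
    times = []
    for _ in range(n):
        now += pareto_interarrival(alpha, kappa, rng.random())
        times.append(now)
    return times


# Parameters of the representative experiment in the text.
rate = mean_arrival_rate(1.1, 0.000346)  # roughly 289 arrivals per time unit
```

Because the variance is unbounded for these parameters, short sample runs can deviate widely from this long-run average, which is what produces the bursts and idle periods discussed above.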

In Figures 17-22 we plot the same representative
replica utilizations over a one minute period in the 
experiment. We also plot the ratio of the overall re- 
quest rate to the total maximum capacities as well 
as the y = 100% utilization line. From the figures 
we see that Avail-Cap suffers from much wilder os- 
cillation than Max-Cap, causing much higher peaks 
and lower valleys in replica utilization than Max- 
Cap. Moreover, Max-Cap adjusts better to the fluc- 
tuations in the request rate; the utilization curves for 
Max-Cap tend to follow the ratio curve more closely 
than those for Avail-Cap. 

(Note that idle periods contribute to the drops in
utilization of replicas in this experiment. For example, an
idle period occurs between times 324 and 332, at which
point we see a decrease in both the ratio and the replica
utilization.)

3) Why Avail-Cap Can Suffer: From the experiments above we
see that Avail-Cap can suffer from severe oscillation even
when the overall request rate is well below (e.g., 80%) the
total maximum capacities of the replicas. The reason
Avail-Cap does not balance load well here is that a vicious
cycle is created in which the available capacity update of
one replica affects a subsequent update of another replica.
This in turn affects later allocation decisions made by
nodes, which in turn affect later replica updates. This
description becomes more concrete if we consider what
happens when a replica is overloaded.

In Avail-Cap, if a replica becomes overloaded, it 
reports an available capacity of zero. This report 
eventually reaches all peer nodes, causing them to 
stop redirecting requests to the replica. The exclu- 
sion of the overloaded replica from the allocation 
decision shifts the entire burden of the workload 
to the other replicas. This can cause other repli- 
cas to overload and report zero available capacity 
while the excluded replica experiences a sharp de- 
crease in its utilization. This sharp decrease causes 
the replica to begin reporting positive available ca- 
pacity which begins to attract requests again. Since 
in the meantime other replicas have become over- 
loaded and excluded from the allocation decision, 
the replica receives a flock of requests which cause 
it to become overloaded again. As we observed 
in previous sections, a replica can experience wild 
and periodic oscillation where its utilization contin- 
uously rises above its maximum capacity and falls 
sharply. 

In Max-Cap, if a replica becomes overloaded, the 
overload condition is confined to that replica. The 
same is true in the case of underloaded replicas. 
Since the overload/underload situations of the replicas
are not reported, they do not influence follow-up
LBI updates of other replicas. It is this key property 
that allows Max-Cap to avoid herd behavior. 
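The contrast between the two algorithms can be illustrated with a small calculation. The sketch below assumes that both algorithms split requests in proportion to an advertised weight (maximum capacity for Max-Cap, last reported available capacity for Avail-Cap); the capacities, the 80% workload, and the function names are illustrative choices rather than the paper's experimental setup.

```python
def shares(weights):
    """Normalize non-negative weights into request-share probabilities."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("no replica has positive weight")
    return [w / total for w in weights]


def utilizations(request_rate, weights, max_caps):
    """Expected utilization of each replica when requests follow `weights`."""
    return [request_rate * s / c for s, c in zip(shares(weights), max_caps)]


max_caps = [100.0, 100.0, 10.0]
rate = 0.8 * sum(max_caps)  # workload at 80% of the total maximum capacity

# Max-Cap: the weights are the maximum capacities, independent of current load.
max_cap_util = utilizations(rate, max_caps, max_caps)

# Avail-Cap: replica 0 reported zero available capacity in its last update,
# while the other two reported the 20% they had spare at that time.
reported_available = [0.0, 20.0, 2.0]
avail_cap_util = utilizations(rate, reported_available, max_caps)
```

Under Max-Cap every replica sits at 80% utilization. Under Avail-Cap the excluded replica receives nothing, and the remaining two are pushed to roughly 153% until the next update, which is the first step of the cycle described above.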

There are situations, however, where Avail-Cap performs
well without suffering from oscillation (see Section IV-C).
We next describe the factors that affect the performance of
Avail-Cap to get a clearer picture of when the reactive
nature of Avail-Cap is beneficial (or at least not harmful)
and when it causes oscillation.

4) Factors Affecting Avail-Cap: There are four factors that
affect the performance of Avail-Cap: the inter-update
period U, the inter-request period R, the amount of time T
it takes for all nodes in the network to receive the latest
update from a replica, and the ratio of the overall request
rate to the total maximum capacities of the replicas. We
examine these factors by considering three cases:

Case 1: U is much smaller than R (U << R), and T is
sufficiently small so that when a replica pushes an update,
all nodes in the CUP tree receive the update before the
next request arrival in the network. In this case,
Avail-Cap performs well since all nodes have the latest
load-balancing information whenever they receive a request.

Case 2: U is long relative to R (U > R) and the overall
request rate is less than about 60% of the total maximum
capacities of the replicas. (This 60% threshold is specific
to the particular configuration of replicas we use: 10%
low, 60% medium, 30% high. Other configurations have
different threshold percentages that are typically well
below the total maximum capacities of the replicas.) In
this case, when a particular replica overloads, the
remaining replicas are able to cover the proportion of
requests intended for the overloaded replica because there
is a lot of extra capacity in the system. As a result,
Avail-Cap avoids oscillations. We see experimental evidence
for this in Section IV-C. However, over-provisioning to
have enough extra capacity in the system so that Avail-Cap
can avoid oscillation in this particular case seems a high
price to pay for load stability.

Case 3: U is long relative to R (U > R) and the overall
request rate is more than about 60% of the total maximum
capacities of the replicas. In this case, as we observe in
the experiments above, Avail-Cap can suffer from
oscillation. This is because every request that arrives
directly affects the available capacity of one of the
replicas. Since the request rate is greater than the update
rate, an update becomes stale shortly after a replica has
pushed it out. However, the replica does not inform the
nodes of its changing available capacity until the end of
its current update period. By that point many requests have
arrived and have been allocated using the previous, stale
available capacity information.

In Case 3, Avail-Cap can suffer even if T = 0 and updates
were to arrive at all nodes immediately after being issued.
This is because all nodes would simultaneously exclude an
overloaded replica from the allocation decision until the
next update is issued. As T increases, the staleness of the
report only worsens the performance of Avail-Cap.

In a large peer-to-peer network (more than 1000 nodes) we
expect that T will be on the order of seconds since current
peer-to-peer networks with more than 1000 nodes have
diameters ranging from a handful to several hops [RF02]. We
consider U = 1 second to be as small (and aggressive) an
inter-update period as is practical in a peer-to-peer
network. In fact even one second may be too aggressive due
to the overhead it generates. This means that when
particular content experiences high popularity, we expect
that typically U + T >> R. Under such circumstances
Avail-Cap is not a good load-balancing choice. For less
popular content, where U + T < R, Avail-Cap is a feasible
choice, although it is unclear whether load-balancing
across the replicas is as urgent here, since the request
rate is low.
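To make the condition U + T >> R concrete, the following sketch estimates how many requests are allocated while a single available-capacity report is in circulation. It is a rough back-of-the-envelope measure under the assumption that a report can be up to U + T old when it is used; the function name and the sample values are hypothetical.

```python
def requests_per_stale_report(update_period, propagation_delay, inter_request_period):
    """Approximate number of requests arriving while one report ages.

    update_period        -- U, time between updates from a replica
    propagation_delay    -- T, time for an update to reach all nodes
    inter_request_period -- R, average time between requests in the network
    """
    if inter_request_period <= 0:
        raise ValueError("inter_request_period must be positive")
    return (update_period + propagation_delay) / inter_request_period


# Illustrative values: U = 1 s, T = 2 s, one request every 10 ms.
popular = requests_per_stale_report(1.0, 2.0, 0.01)    # about 300 requests
# Less popular content: one request every 5 s.
unpopular = requests_per_stale_report(1.0, 2.0, 5.0)   # fewer than one request
```

When the result is far above one, as in the first case, most allocation decisions rely on a report that no longer reflects the replica's state; when it is below one, each request is likely to see a fresh report.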

The performance of Max-Cap is independent of the values of
U, R, and T. More importantly, Max-Cap does not require
continuous updates; replicas issue updates only if they
choose to re-issue new contracts to report changes in their
maximum capacities (see Section IV-D). Therefore, we
believe that Max-Cap is a more practical choice in a
peer-to-peer context than Avail-Cap.

C. Dynamic Replica Set 

A key characteristic of peer-to-peer networks is that they
are subject to constant change; peer nodes continuously
enter and leave the system. In this experiment we compare
Max-Cap with Avail-Cap when replicas enter and leave the
system. We present results here for a Poisson request
arrival rate that is 80% of the total maximum capacities of
the replicas.

We present two dynamic experiments. In both experiments,
the network starts with ten replicas and after a period of
600 seconds, movement into and out of the network begins.
In the first experiment, one replica leaves and one replica
enters the network every 60 seconds. In the second and much
more dynamic experiment, five replicas leave and five
replicas enter the network every 60 seconds. The replicas
that leave are randomly chosen. The replicas that enter the
network enter with maximum capacities of 1, 10, and 100
with probability of 0.10, 0.60, and 0.30 respectively, as
in the initial allocation. This means that the total
maximum capacity of the active replicas in the network
varies throughout the experiment, depending on the
capacities of the entering replicas.

[Figure: Replica Utilization versus Time for Avail-Cap with
a dynamic replica set. One replica enters and leaves every
60 seconds.]

Figures 23 and 24 show for the first dynamic experiment the
utilization of active replicas throughout time as observed
for Avail-Cap and Max-Cap. Note that points with zero
utilization indicate newly entering replicas. The jagged
line plots the ratio of the current sum of maximum
capacities in the network, S_curr, to the original sum of
maximum capacities, S_orig. With each change in the replica
set, the replica utilizations for both Avail-Cap and
Max-Cap change. Replica utilizations rise when S_curr falls
and vice versa.

From the figure we see that between times 1000 and 1820,
S_curr is between 1.75 and 2 times S_orig, and is more than
double the overall workload rate of 0.8 * S_orig. During
this time period, Avail-Cap performs quite well because the
workload rate is not very demanding and there is plenty of
extra capacity in the system (Case 2 above). However, when
at time 1940 S_curr falls back to S_orig, we see that both
algorithms exhibit the same behavior as they do at the
start, between times 0 and 600. Max-Cap readjusts nicely
and clusters replica utilization at around 80%, while
Avail-Cap starts to suffer again.

[Figure: Replica Utilization versus Time for Max-Cap with a
dynamic replica set. One replica enters and leaves every 60
seconds.]

Figures 25 and 26 show for the first dynamic experiment the
percentage of queries that were received by each replica
while the replica was overloaded for Avail-Cap and Max-Cap.
Replicas that entered and departed the network throughout
the simulation were chosen from a pool of 50 replicas.
Those replicas in the pool which did not participate in
this experiment do not have a bar associated with their ID
in the figure. From the figure, we see that Max-Cap
achieves smaller overload query percentages across all
replica IDs.



Figures 27 and 28 show the utilization scatterplot and
Figures 29 and 30 show the overloaded query percentage for
the second dynamic experiment. We see that changing half
the replicas every 60 seconds can dramatically affect
S_curr. For example, when S_curr drops to 0.2 S_orig at
time 2161, we see the utilizations rise dramatically for
both Avail-Cap and Max-Cap. This is because during this
period the workload rate is four times S_curr. However, by
time 2401, S_curr has risen to 1.2 S_orig, which allows
both Avail-Cap and Max-Cap to adjust and decrease the
replica utilization. At the next replica set change at time
2461, S_curr equals S_orig. During the next minute we see
that Max-Cap overloads very few replicas whereas Avail-Cap
does not recover as well. Similarly, when examining the
overloaded query percentage we see that Max-Cap achieves
smaller percentages when compared with Avail-Cap.

[Figure: Percentage Overloaded Queries versus Replica ID
for Max-Cap with a dynamic replica set. One replica enters
and leaves every 60 seconds.]
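The arithmetic behind these observations is straightforward: with a workload fixed at 0.8 times the original total capacity, the aggregate utilization is 0.8 divided by the ratio of current to original total capacity. The sketch below works this out and also shows how an entering replica's capacity could be drawn with the stated probabilities; the names are hypothetical and the code is not the simulator used in the experiments.

```python
import random

ENTERING_CAPACITIES = (1, 10, 100)
ENTERING_PROBABILITIES = (0.10, 0.60, 0.30)


def draw_entering_capacity(rng):
    """Maximum capacity of a newly entering replica."""
    return rng.choices(ENTERING_CAPACITIES, weights=ENTERING_PROBABILITIES, k=1)[0]


def mean_entering_capacity():
    """Expected maximum capacity of an entering replica."""
    return sum(c * p for c, p in zip(ENTERING_CAPACITIES, ENTERING_PROBABILITIES))


def aggregate_utilization(workload_fraction, s_curr_over_s_orig):
    """Workload (a fraction of the original total) over the current total."""
    if s_curr_over_s_orig <= 0:
        raise ValueError("current total capacity must be positive")
    return workload_fraction / s_curr_over_s_orig


at_time_2161 = aggregate_utilization(0.8, 0.2)  # 4.0, i.e. 400% of capacity
at_time_2401 = aggregate_utilization(0.8, 1.2)  # about 0.67
at_time_2461 = aggregate_utilization(0.8, 1.0)  # 0.8, the original setting
```

Since a single capacity-100 replica outweighs many low and medium ones, replacing five of ten replicas at once can swing the current total capacity sharply in either direction, which is consistent with the jagged ratio curve described above.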

The two dynamic experiments we have described above show
two things. First, when the workload is not very demanding
and there is unused capacity, the behaviors of Avail-Cap
and Max-Cap are similar. However, Avail-Cap suffers more as
overall available capacity decreases. Second, Avail-Cap is
affected more by short-lived fluctuations (in particular,
decreases) in total maximum capacity than Max-Cap. This is
because the reactive nature of Avail-Cap causes it to adapt
abruptly to changes in capacities, even when these changes
are short-lived.

[Figure: Replica Utilization versus Time for Avail-Cap with
a dynamic replica set. Half the replicas enter and leave
every 60 seconds.]

[Figure: Replica Utilization versus Time for Max-Cap with a
dynamic replica set. Half the replicas enter and leave
every 60 seconds.]

D. Extraneous Load 

When replicas can honor their maximum capacities, Max-Cap
avoids the oscillation that Avail-Cap can suffer, and does
so with no update overhead. Occasionally, some replicas may
not be able to honor their maximum capacities because of
extraneous load caused by other applications running on the
replicas or network conditions unrelated to the content
request workload.

To deal with the possibility of extraneous load, we modify
the Max-Cap algorithm slightly to work with honored maximum
capacities. A replica's honored maximum capacity is its
maximum capacity minus the extraneous load it is
experiencing. The algorithm changes slightly: a peer node
chooses a replica to which to forward a content request
with probability proportional to the honored maximum
capacity advertised by the replica. This means that
replicas may choose to send updates to indicate changes in
their honored maximum capacities. However, as we will show,
the behavior of Max-Cap is not tied to the timeliness of
updates in the way Avail-Cap is.
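The modified selection rule can be sketched as follows. This is a minimal illustration, assuming each peer node keeps the last honored maximum capacity advertised by every replica; the function names, the replica identifiers, and the clamping of the honored capacity at zero are assumptions made for the example.

```python
import random


def honored_capacity(max_capacity, extraneous_load):
    """Maximum capacity minus extraneous load (clamped at zero as an assumption)."""
    return max(0.0, max_capacity - extraneous_load)


def choose_replica(advertised, rng):
    """Pick a replica with probability proportional to its advertised
    honored maximum capacity.

    advertised -- dict mapping replica id to honored maximum capacity
    """
    replica_ids = list(advertised)
    weights = [advertised[r] for r in replica_ids]
    return rng.choices(replica_ids, weights=weights, k=1)[0]


# Illustrative contracts: replica "a" loses half of its capacity to other work.
advertised = {
    "a": honored_capacity(100.0, 50.0),
    "b": honored_capacity(10.0, 0.0),
    "c": honored_capacity(1.0, 1.0),
}
rng = random.Random(7)
picks = [choose_replica(advertised, rng) for _ in range(6000)]
```

With these contracts replica "a" should receive about five sixths of the requests and replica "c" none. The weights change only when a replica issues a new contract, not when its instantaneous load changes.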

We view the honored maximum capacity reported by a replica
as a contract. If the replica cannot adhere to the contract
or has extra capacity to give, but does not report the
deficit or surplus, then that replica alone will be
affected and may be overloaded or underloaded since it will
be receiving a request share that is proportional to its
previously advertised honored maximum capacity.

If, on the other hand, a replica chooses to issue a new
contract with the new honored maximum capacity, then this
new update can affect the load balancing decisions of the
nodes in the peer network and the workload could shift to
the other replicas. This shift in workload is quite
different from that experienced by Avail-Cap when a replica
reports overload and is excluded. The contracts of the
other replicas will not be affected by this workload shift.
Instead, each contract is affected solely by the extraneous
load its own replica experiences, which is independent of
the extraneous load experienced by the replica issuing the
new contract. This is unlike Avail-Cap, where the available
capacity reported by one replica directly affects the
available capacities of the others.

In this section we study the performance of Max-Cap in an
experiment where all replica nodes are continuously issuing
new contracts. Specifically, for each of ten replicas, we
inject extraneous load into the replica once a second. The
extraneous load injected is randomly chosen to be anywhere
between 0% and 50% of the replica's original maximum
capacity. Figures 31 and 32 show the replica utilization
versus time and the overloaded query percentages for
Max-Cap with an inter-update period of 1 second. The jagged
line in Figure 31 shows the total honored maximum
capacities over time. Since throughout the experiment each
replica's honored maximum capacity varies between 50% and
100% of its original maximum capacity, the total honored
maximum capacity is expected to hover at around 75% of the
original total maximum capacity, and we see that the jagged
line hovers around this value. We therefore generate
Poisson request arrivals with an average rate that is 80%
of this value to keep consistent with our running example
of 80% workload rates.

[Figure: Replica Utilization versus Time for Max-Cap with
extraneous load and an inter-update period of one second.]

[Figure: Percentage Overloaded Queries versus Replica ID
for Max-Cap with extraneous load and an inter-update period
of one second.]
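The 75% figure and the resulting request rate follow from a short expectation calculation. The derivation below is a sketch that assumes the injected fraction is drawn uniformly between 0 and 0.5 and independently for each replica; the text only states that it is randomly chosen in that range. Here $c_i$ denotes a replica's original maximum capacity and $S_{orig}$ the original total.

```latex
% Assumption: the injected fraction X of each replica's original maximum
% capacity c_i is uniform on [0, 0.5], so E[X] = 0.25.
\begin{align*}
E[\text{honored}_i] &= c_i \,(1 - E[X]) = c_i\,(1 - 0.25) = 0.75\, c_i \\
E\Big[\sum_i \text{honored}_i\Big] &= 0.75 \sum_i c_i = 0.75\, S_{orig} \\
\text{average request rate} &= 0.8 \times 0.75\, S_{orig} = 0.6\, S_{orig}
\end{align*}
```

Under this assumption the workload in this experiment corresponds to 60% of the original total maximum capacity, while still being 80% of what the replicas can honor on average.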

From the figures, we see that Max-Cap continues to cluster
replica utilization at around 80%, but there are more
overloaded replicas throughout time than in the experiment
in which all replicas adhere to their contracts all the
time (Figure 11). We also see that the overloaded
percentages are higher than before (Figure 13). The reason
for this performance degradation is that the randomly
injected load (of 0% to 50%) can cause sharp rises and
falls in the reported contract of each replica from one
second to the next. Since the change is so rapid, and
updates take on the order of seconds to reach all
allocating nodes, allocation decisions are continuously
being made using stale information.

[Fig. 33: Replica Utilization versus Time for Max-Cap with
extraneous load and an inter-update period of ten seconds.]

In the next experiment we use the same parameters as above
but we change the update period to 10 seconds. Figures 33
and 34 show the utilization and overloaded percentages for
this experiment. We see that the overloaded percentages
increase only slightly while the overhead of pushing the
updates decreases by a factor of ten. In contrast, when we
perform the same experiment for Avail-Cap, we find that the
overloaded query percentages for Avail-Cap increase from
about 55% to more than 80% across all the replicas when the
inter-update period changes from 1 to 10 seconds. However,
this performance degradation is not so much due to the
fluctuation of the extraneous load as it is due to
Avail-Cap's tendency to oscillate when the request rate is
greater than the update rate.

We purposely choose this scenario to test how 
Max-Cap performs under widely fluctuating extra- 
neous load on every replica. We generally expect 
that extraneous load will not fluctuate so wildly, nor 
will all replicas issue new contracts every second. 
Moreover, we expect the inter-update period to be 
on the order of several seconds or even minutes, 
which further reduces overhead. 

We can view the effect of extraneous load on the
performance of Max-Cap as similar to that seen in the
dynamic replica experiments. When a replica advertises a
new honored maximum capacity, it is as if that replica were
leaving and being replaced by a new replica with a
different maximum capacity.

[Figure: Percentage Overloaded Queries versus Replica ID
for Max-Cap with extraneous load and an inter-update period
of ten seconds.]



V. Related Work 

Load-balancing has been the focus of many studies 
described in the distributed systems literature. We 
first describe load-balancing techniques that could 
be applied in a peer-to-peer context. We classify 
these into two categories, those algorithms where 
the allocation decision is based on load and those 
where the allocation decision is based on available 
capacity. We then describe other load-balancing 
techniques (such as process migration) that cannot 
be directly applied in a peer-to-peer context. 

A. Load-Based Algorithms 

Of the load-balancing algorithms based on load, a very
common approach to performing load-balancing is to choose
the server with the least reported load from among a set of
servers. This approach performs well in a homogeneous
system where the task allocation is performed by a single
centralized entity (dispatcher) which has complete
up-to-date load information [Web78], [Win77]. In a system
where multiple dispatchers are independently performing the
allocation of tasks, however, this approach has been shown
to behave badly, especially if the load information used is
stale [ELZ86], [MTS89], [Mit97], [SKS92]. Mitzenmacher
talks about the "herd behavior" that can occur when servers
that have reported low load are inundated with requests
from dispatchers until new load information is reported
[Mit97].

Dahlin proposes load interpretation algorithms
[Dah99]. These algorithms take into account the
age (staleness) of the load information reported by 
each of a set of distributed homogeneous servers as 
well as an estimate of the rate at which new requests 
arrive at the whole system to determine to which 
server to allocate a request. 

Many studies have focused on the strategy of using a subset
of the load information available. This involves first
randomly choosing a small number, k, of homogeneous servers
and then choosing the least loaded server from within that
set [Mit96], [ELZ86], [VDK96], [ABKU94], [KLH92]. In
particular, for homogeneous systems, Mitzenmacher [Mit96]
studies the tradeoffs of various choices of k and various
degrees of staleness of load information reported. As the
degree of staleness increases, smaller values of k are
preferable.
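The k-subset strategy described above can be sketched as follows. This is a generic illustration of the technique rather than the exact algorithm of any cited study; the function name and the sample loads are hypothetical.

```python
import random


def least_loaded_of_k(reported_loads, k, rng):
    """Sample k distinct servers at random and return the index of the one
    with the smallest reported load among them."""
    candidates = rng.sample(range(len(reported_loads)), k)
    return min(candidates, key=lambda i: reported_loads[i])


reported_loads = [5.0, 3.0, 9.0, 1.0]
rng = random.Random(1)
full_view = least_loaded_of_k(reported_loads, 4, rng)    # always index 3
random_pick = least_loaded_of_k(reported_loads, 1, rng)  # uniform random server
```

Setting k to the number of servers reduces to always choosing the server with the least reported load, which is the setting most exposed to herd behavior when reports are stale, while k = 1 ignores load information entirely.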

Genova et al. [GC00] propose an algorithm, which we call
Inv-Load, that first randomly selects k servers. The
algorithm then weighs the servers by load information and
chooses a server with probability that is inversely
proportional to the load reported by that server. When
k = n, where n is the total number of servers, the
algorithm is shown to perform better than previous
load-based algorithms and for this reason we focus on this
algorithm in this paper.
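A minimal sketch of the Inv-Load selection rule, as described above, is given below. The handling of a reported load of zero (a small `epsilon` added to every load) is an assumption made so that the example runs, not a detail taken from the original algorithm; the function name is likewise hypothetical.

```python
import random


def inv_load_choice(reported_loads, k, rng, epsilon=1e-9):
    """Sample k distinct servers, then pick one with probability inversely
    proportional to its reported load.

    epsilon avoids division by zero for idle servers (an assumption).
    """
    candidates = rng.sample(range(len(reported_loads)), k)
    weights = [1.0 / (reported_loads[i] + epsilon) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Two servers with loads 1 and 3: weights 1 and 1/3, so server 0 should be
# chosen about 75% of the time when k = n = 2.
rng = random.Random(11)
picks = [inv_load_choice([1.0, 3.0], 2, rng) for _ in range(8000)]
```

Note that the weights depend only on reported load and carry no notion of how much load a server can sustain, so a low-capacity and a high-capacity server reporting the same load are treated identically.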



As we see in Section IV-A, algorithms that base
the decision on load do not handle heterogeneity. 

B. Available-Capacity-Based Algorithms 

Of the load-balancing algorithms based on available
capacity, one common approach has been to choose amongst a
set of servers based on the available capacity of each
server [ZYZ+98] or the available bandwidth in the network
to each server [CC97]. The server with the highest
available capacity/bandwidth is chosen by a client with a
request. The assumption here is that the reported available
capacity/bandwidth will continue to be valid until the
chosen server has finished servicing the client's request.
This assumption does not always hold; external traffic
caused by other applications can invalidate the assumption,
but more surprisingly the traffic caused by the application
whose workload is being balanced can also invalidate the
assumption. We see this in Section IV-B.

Another approach is to exclude servers that fail some
utilization threshold and to choose from the remaining
servers. Mirchandaney et al. [MTS90] and Shivaratri et al.
[SKS92] classify machines as lightly-utilized or
heavily-utilized and then choose randomly from the
lightly-utilized servers. This work focuses on local-area
distributed systems. Colajanni et al. use this approach to
enhance round-robin DNS load-balancing across a set of
widely distributed heterogeneous web servers [CYC98].
Specifically, when a web server surpasses a utilization
threshold it sends an alarm signal to the DNS system
indicating it is out of commission. The server is excluded
from the DNS resolution until it sends another signal
indicating it is below threshold and free to service
requests again. In this work, the maximum capacities of the
most capable servers are at most a factor of three that of
the least capable servers.

As we see in Section IV-B, when applied in the context of a
peer-to-peer network where many nodes are making the
allocation decision and where the maximum capacities of the
replica nodes can differ by two orders of magnitude,
excluding a serving node temporarily from the allocation
decision can result in load oscillation.

C. Other Load-balancing Techniques 

We now describe load-balancing techniques that ap- 
pear in the literature but cannot be directly applied 
in a peer-to-peer context. 

There has been a large body of work devoted to the problem
of load-balancing across a set of servers residing within a
cluster. In some cluster systems there is one centralized
dispatcher through which all incoming requests to the
cluster arrive. The dispatcher has full control over the
allocation of requests to servers [ DKMT9^ , ^i^. In other
systems there are multiple dispatchers that make the
allocation decision. One common approach is to have
front-end servers sit at the entrance of the cluster
intercepting incoming requests and allocating requests to
the back-end servers within the cluster that actually
satisfy the requests [CDR99]. Still others have requests be
evenly routed to servers within the cluster via DNS
rotation (described below) or via a single IP-switch
sitting at the front of the cluster (e.g., [fou98]). Upon
receiving a request each server then decides whether to
satisfy the request or to dispatch it to another server
[ZYZ+98]. Some cluster systems have the dispatcher(s) poll
each server or a random set of servers for
load/availability information just before each allocation
decision [AAFL96], [SYC02]. Others have the dispatcher(s)
periodically poll servers, while still others have servers
periodically broadcast their load-balancing information.
Studies that compare the tradeoffs among these information
dissemination options within a cluster include [ZYZ+98],
[SYC02].



Regardless of the way this information is ex- 
changed, cluster-based algorithms take advantage of 
the local-area nature of the cluster network to de- 
liver timely load-balancing updates. This character- 
istic does not apply in a peer-to-peer network where 
load-balancing updates may have to travel across 
the Internet. 

Most cluster algorithms assume that servers are
homogeneous. The exceptions to this rule include work by
Castro et al. [CDR99]. This work assumes that servers will
have different processing capabilities and allows each
server to stipulate a maximum desirable utilization that is
incorporated into the load-balancing algorithm. The
algorithm they use assumes that servers are synchronized
and send their load updates at the same time. This is not
true in a peer-to-peer network where replicas cannot be
synchronized. Zhu et al. [ZYZ+98] assume servers are
heterogeneous and use a metric that combines available disk
capacity and CPU cycles to choose a server within the
cluster to handle a task. Their algorithm uses a
combination of random polling before selection and random
multicasting of load-balancing information to a select few
servers. Both are techniques that would not scale in a
large peer-to-peer network.



Another well-studied load-balancing cluster approach is to
have heavily loaded servers hand off requests they receive
to other servers within the cluster that are less loaded or
to have lightly loaded servers attempt to get tasks from
heavily loaded servers (e.g., [Dan95], [SK90]). This can be
achieved through techniques such as HTTP redirection (e.g.,
VPCY9%, [AYI96], [CCY00]) or packet header rewriting (e.g.,
[AB00]) or remote script execution [ZYZ+98]. HTTP
redirection adds additional client round-trip latency for
every rescheduled request. TCP/IP hand-off and packet
header rewriting require changes in the OS kernel or
network interface drivers. Remote script execution requires
trust between the serving entities.

Similar to task handoff is the technique of process
migration. Process migration to spread job load across a
set of servers in a local-area distributed system has been
widely studied both in the theoretical literature as well
as the systems literature (e.g., [DO91], [LM93], [DHB95],
[PL95], [ |:.L96| 1). In these systems overloaded servers
migrate some of their currently running processes to more
lightly loaded servers in an attempt to achieve a more
equitable distribution of work across the servers.

Both task handoff and process migration require 
close coordination amongst serving entities that can 
be afforded in a tightly-coupled communication en- 
vironment such as a cluster or local-area distributed 
system. In a peer-to-peer network where the replica 
nodes serving the content may be widely distributed 
across the Internet, these techniques are not possi- 
ble. 

A lot of work has looked at balancing load across
multi-server homogeneous web sites by leveraging the DNS
service used to provide the mapping between a web page's
URL and the IP address of a web server serving the URL.
Round-robin DNS was proposed, where the DNS system maps
requests to web servers in a round-robin fashion [KBM94],
[AYHI96]. Because DNS mappings have a Time-to-Live (TTL)
field associated with them and tend to be cached at the
local name server in each domain, this approach can lead to
a large number of client requests from a particular domain
getting mapped to the same web server during the TTL
period. Thus, round-robin DNS achieves good balance only so
long as each domain has the same client request rate.
Moreover, round-robin load-balancing does not work in a
heterogeneous peer-to-peer context because each serving
replica gets a uniform rate of requests regardless of
whether it can handle this rate. Work that takes into
account domain request rate improves upon round-robin DNS
and is described by Colajanni et al. [CYD97].
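The problem with uniform allocation across heterogeneous replicas can be seen with a small calculation. The sketch below uses the capacities 1, 10, and 100 from the experiments and an 80% aggregate workload as illustrative inputs; the function name is hypothetical.

```python
def round_robin_utilizations(request_rate, max_caps):
    """Utilization of each replica when every replica receives the same
    share of the request rate, regardless of its capacity."""
    per_replica_rate = request_rate / len(max_caps)
    return [per_replica_rate / c for c in max_caps]


max_caps = [1.0, 10.0, 100.0]
rate = 0.8 * sum(max_caps)  # 88.8 requests per time unit in aggregate
utils = round_robin_utilizations(rate, max_caps)
# Each replica receives 29.6 requests per time unit: the capacity-1 replica
# is at 2960% utilization while the capacity-100 replica is below 30%.
```

Even though the aggregate workload is well within the total capacity, two of the three replicas are heavily overloaded, which is why a uniform request rate is unsuitable when capacities differ by orders of magnitude.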



Colajanni et al. later extend this work to balance load
across a set of widely distributed heterogeneous web
servers [CYC98]. This work proposes the use of adaptive
TTLs, where the TTL for a DNS mapping is set inversely
proportional to the domain's local client request rate for
the mapping of interest (as reported by the domain's local
name server). The TTL is at the same time set to be
proportional to the chosen web server's maximum capacity.
So web servers with high maximum capacity will have DNS
mappings with longer TTLs, and domains with low request
rates will receive mappings with longer TTLs. Max-Cap, the
algorithm proposed in this paper, also uses the maximum
capacities of the serving replica nodes to allocate
requests proportionally. The main difference is that in the
work by Colajanni et al., the root DNS scheduler acts as a
centralized dispatcher setting all DNS mappings and is
assumed to know what the request rate in the requesting
domain is like. In the peer-to-peer case the authority node
has no idea what the request rate throughout the network is
like, nor how large the set of requesting nodes is.
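The adaptive TTL rule described above can be written as a one-line formula. The sketch below assumes a simple direct and inverse proportionality with a hypothetical scaling constant `base_ttl`; the cited work may define the constant and any bounds differently.

```python
def adaptive_ttl(base_ttl, server_max_capacity, domain_request_rate):
    """TTL proportional to the chosen server's maximum capacity and inversely
    proportional to the requesting domain's request rate.

    base_ttl is a hypothetical scaling constant, not a value from the paper.
    """
    if domain_request_rate <= 0:
        raise ValueError("domain_request_rate must be positive")
    return base_ttl * server_max_capacity / domain_request_rate


# A high-capacity server mapped to a quiet domain keeps its mapping longest.
long_ttl = adaptive_ttl(60.0, 100.0, 2.0)    # 3000.0
short_ttl = adaptive_ttl(60.0, 10.0, 20.0)   # 30.0
```

The rule needs the request rate of each requesting domain as input, which is exactly the information that, as noted above, an authority node in a peer-to-peer network does not have.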

Lottery scheduling is another technique that, like 
Max-Cap, uses proportional allocation. This ap- 
proach has been proposed in the context of resource 
allocation within an operating system (the Mach microkernel)
[WW94]. Client processes hold tickets
that give them access to particular resources in the 
operating system. Clients are allocated resources 
by a centralized lottery scheduler proportionally to 
the number of tickets they own and can donate their 
tickets to other clients in exchange for tickets at a 
later point. Max-Cap is similar in that it allocates 
requests to a replica node proportionally to the max- 
imum capacity of the replica node. The main differ- 
ence is that in Max-Cap the allocation decision is 
completely distributed, with no opportunity for exchange
of resources across replica nodes.
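
Proportional allocation by maximum capacity can be sketched as a weighted random choice made independently by each requesting node. This is only an illustration of the proportional-allocation idea; it is not the full Max-Cap algorithm, and the names `choose_replica` and `allocation_shares` are hypothetical.

```python
import random


def allocation_shares(max_capacities):
    """Return the fraction of requests each replica should receive."""
    total = sum(max_capacities.values())
    return {replica: cap / total for replica, cap in max_capacities.items()}


def choose_replica(max_capacities, rng=random):
    """Pick one replica with probability proportional to its maximum capacity."""
    replica_ids = list(max_capacities)
    weights = [max_capacities[r] for r in replica_ids]
    return rng.choices(replica_ids, weights=weights, k=1)[0]
```

For example, with `{"a": 10, "b": 30}` replica `b` is expected to receive three quarters of the requests. Because each node draws independently from the same weights, no central scheduler and no exchange of tickets is needed.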



VI. Conclusions 

In this paper we examine the problem of load- 
balancing in a peer-to-peer network where the goal 
is to distribute the demand for a particular content 
fairly across the set of replica nodes that serve that 
content. Existing load-balancing algorithms pro- 
posed in the distributed systems literature are not 
appropriate for a peer-to-peer network. We find 
that load-based algorithms do not handle the hetero- 
geneity that is typical in a peer-to-peer network. We 
also find that algorithms based on available capacity 
reports can suffer from load oscillations even when 
the workload request rate is as low as 60% of the 
total maximum capacities of replicas. 

We propose and evaluate Max-Cap, a practical 
algorithm for load-balancing. Max-Cap handles 
heterogeneity, yet does not suffer from oscillations 
when the workload rate is below 100% of the total 
maximum capacities of the replicas, adjusts better 
to very large fluctuations in the workload and con- 
stantly changing replica sets, and incurs less over- 
head than algorithms based on available capacity 
since its reports are affected only by extraneous load 
on the replicas. We believe this makes Max-Cap a 
practical and elegant algorithm to apply in peer-to- 
peer networks. 

Appendix

It should not surprise the reader that Inv-Load does not 
handle heterogeneity since the same load at one replica 
may have a different effect on another with a different 
maximum capacity. Surprisingly, however, when replicas
are homogeneous, the performance of Inv-Load is
comparable to that of Max-Cap.

In this set of experiments, there are ten replicas, each 
of whose maximum capacity we set at 10 requests per 
second for a total maximum capacity of 100 requests per 
second. Queries are generated according to a Poisson
process whose rate (lambda) is 80% of the total maximum
capacity of the replicas, i.e., 80 queries per second.
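
The workload in this setup can be reproduced approximately with the sketch below, which draws exponential interarrival times to generate a Poisson arrival process. The function name and the constants are illustrative assumptions; the simulator used for the experiments is not described at this level of detail.

```python
import random


def poisson_arrival_times(rate, duration, rng):
    """Return sorted arrival times in [0, duration) for a Poisson process."""
    times = []
    t = rng.expovariate(rate)  # exponential gaps give Poisson arrivals
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times


NUM_REPLICAS = 10
CAPACITY_PER_REPLICA = 10.0  # requests per second
total_capacity = NUM_REPLICAS * CAPACITY_PER_REPLICA
query_rate = 0.8 * total_capacity  # 80% of total maximum capacity
```

Calling `poisson_arrival_times(query_rate, 100.0, random.Random(1))` yields roughly 8000 query arrival times over 100 seconds of simulated time.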

Figures 35 and 36 show scatterplots of how the utilization
of each replica proceeds with time when using
Inv-Load with a refresh period of one time unit and
Max-Cap, respectively. Inv-Load and Max-Cap have similar
scatterplots.

Figures 37 and 38 show, for each replica, the percentage
of queries that arrived at the replica while the replica
was overloaded. Again, we see that Inv-Load and Max-Cap
have comparable performance.

The difference is that Inv-Load incurs the extra over- 
head of one load update per replica per second. In a 
CUP tree of 100 nodes this translates to 1000 updates 
per second being pushed down the CUP tree. In a tree 
of 1000 nodes this translates to 10000 updates per second
being pushed. Thus, the larger the CUP tree, the larger 
the overall network overhead. The overhead incurred by 
Inv-Load could be reduced by increasing the period be- 
tween two consecutive updates at each replica. Increas- 
ing the period results in staler load updates. We find that 
when experimenting with a range of periods (one to sixty
seconds), we confirm earlier studies [Mit97] that have
found that as load information becomes more stale with
increasing periods, the performance of load-based
balancing algorithms decreases.
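
The overhead arithmetic above can be written as a small helper. It assumes, as the figures quoted in the text imply, that every replica's update is pushed to every node in the CUP tree; the function name is hypothetical.

```python
def updates_pushed_per_second(num_replicas, tree_nodes, period_seconds=1.0):
    """Updates pushed per second if each update reaches every tree node.

    Each replica emits one load update every `period_seconds`.
    """
    if period_seconds <= 0:
        raise ValueError("period must be positive")
    return num_replicas * tree_nodes / period_seconds
```

With ten replicas and a one-second period this gives 1000 updates per second for a 100-node tree and 10000 for a 1000-node tree. Lengthening the period to sixty seconds cuts the 100-node figure to about 17 updates per second, at the cost of staler load information.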

We ran experiments with Pareto(α, k) query interarrivals
over a wide range of α and k values (the Pareto
distribution shape and scale parameters) and found that
with homogeneous replicas, Inv-Load with a period of 
one and Max-Cap continue to be comparable. However, 
Max-Cap is preferable in these cases because it incurs no 
overhead. 
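
Pareto-distributed interarrival times with shape α and scale k can be drawn as in the sketch below. It relies on Python's `random.paretovariate`, which samples a Pareto distribution with scale 1, and multiplies by k to set the scale. The function name is an assumption made for illustration and does not describe the simulator used in the experiments.

```python
import random


def pareto_interarrivals(alpha, k, n, rng):
    """Draw n interarrival times from a Pareto(alpha, k) distribution.

    alpha is the shape parameter and k the scale (minimum value).
    """
    if alpha <= 0 or k <= 0:
        raise ValueError("alpha and k must be positive")
    return [k * rng.paretovariate(alpha) for _ in range(n)]
```

Every sample is at least k, and smaller values of α produce heavier tails, that is, occasional very long gaps between bursts of queries.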

Fig. 35

Replica Utilization versus Time for Inv-Load with an inter-update period of one second and homogeneous replicas.

Fig. 36

Replica Utilization versus Time for Max-Cap with homogeneous replicas.

Fig. 37

Percentage Overload Queries versus Replica ID for Inv-Load with an inter-update period of one second and homogeneous replicas.

Fig. 38

Percentage Overload Queries versus Replica ID for Max-Cap with homogeneous replicas.